Intelligence against complexity: Machine learning for nonuniform temperature-field measurements through laser absorption

The effect of spatial nonuniformity of the temperature distribution was examined on the capability of machine-learning algorithms to provide accurate temperature prediction based on Laser Absorption Spectroscopy. First, sixteen machine learning models were trained as surrogate models of conventional physical methods to measure temperature from uniform temperature distributions (uniform-profile spectra). The best three of them, Gaussian Process Regression (GPR), VGG13, and Boosted Random Forest (BRF) were shown to work excellently on uniform profiles but their performance degraded tremendously on nonuniform-profile spectra. This indicated that directly using uniform-profile-targeted methods to nonuniform profiles was improper. However, after retraining models on nonuniform-profile data, the models of GPR and VGG13, which utilized all features of the spectra, not only showed good accuracy and sensitivity to spectral twins, but also showed excellent generalization performance on spectra of increased nonuniformity, which demonstrated that the negative effects of nonuniformity on temperature measurement could be overcome. In contrast, BRF, which utilized partial features, did not have good generalization performance, which implied the nonuniformity level had impact on regional features of spectra. By reducing the data dimensionality through T-SNE and LDA, the visualizations of the data in two-dimensional feature spaces demonstrated that two datasets of substantially different levels of non-uniformity shared very closely similar distributions in terms of both spectral appearance and spectrum-temperature mapping. Notably, datasets from uniform and nonuniform temperature distributions clustered in two different areas of the 2D spaces of the t-SNE and LDA features with very few samples overlapping.


Introduction
Laser absorption Spectroscopy (LAS) is a fundamental temperature measurement tool, which is extensively used in combustion diagnostics [1], emission detection and quantification [2,3]. Its relevance has been strengthened recently because of the realization that absorption by molecules in the terrestrial atmosphere plays a substantial role in terms of climate change [4]. This is a line-of-sight measurement that cannot yield spatially resolved information. Consequently, LAS application is often restricted to laminar flows which possess a priori known profiles along the light path. Based on an assumption of uniform temperature profile, two-color methods [5] and line reversal [6] have been developed and applied for average temperature measurement. Machine learning models, which can be regarded as surrogate models of aforementioned physics-based methods, have also recently emerged in the field [7,8]. Turbulent flows, which possess nonuniform and intensely unsteady temperature and concentration profiles are almost ubiquitous in practical applications, such as combustion in gas turbines [9] and scramjets [10]. To deal with such cases, some methods based on two-color pyrometry have been proposed. A robust idea is to do multiple two color measurements with different line pairs [11]. The multiple measurements can be used in order to calculate average temperature and imply the profile nonuniformity. However, the different combinations of multiple two-color measurements can lead to varying average temperature or nonuniformity level. Goldenstein et, al. [12] judiciously selected lines of water which presented signal strengths that were closely linear with temperature, so that the temperature measured by this two-color pyrometry, could be equaled to the average temperature along the light path. However, these lines can only be approximated to be linear temperature dependence in a limited temperature range.
In addition to two-color measurements, there are other methods which use more lines or even entire bands. Profile fitting [13,14], an upgraded line reversal, reconstructs the parameters of a profile shape function rather than iterating on temperature but with the constraint that the prior knowledge of the profile shape must be known. The Temperature Binning Distribution (TBD) method [15] does not need this prior knowledge, it retrieves temperature distribution bins by solving algebraic equation sets constructed by intensities of multiple lines at configured temperature bins, and the corresponding line intensities extracted from the experimental spectrum. However, the bins cannot be too finely divided, otherwise, a huge equation set must be solved, which may even be singular without unique solution. The Temperature Distribution Function method [16] is the most recent one, which utilizes the transformation of the spectrum and profile shape functions, but still, it is mainly available for simple and regular profile shapes.
In any case, the development of these methods has extended the feasibility of acquiring accurate temperature information along nonuniform profiles. It is also worth noting that these techniques do not provide any insights as to how profile nonuniformity affects the inaccuracy of the average temperature measurement. Understanding of this effect is most important, since temperature and concentration nonuniformity can affect spectral response in several interesting ways, two cases of which are shown in Fig 1. The first is a case of one uniform profile and one nonuniform profile that have the same average temperature and concentration, but can lead to different spectral appearances, as shown in Fig 1(a). According to this spectral discrepancy, applying the methods targeting uniform profile to nonuniform ones would introduce inaccuracy. The second case is when spectra have very closely the same measured spectrum, but are in fact generated from totally different profiles with distinct average temperatures, as shown in the Fig 1(b). This phenomenon is referred to as spectral twins. A question then rises as to whether spectra of twins can be distinguishable and estimated properly. Without enough understanding and consideration of the effect of nonuniformity on temperature measurement, which includes resolving the two situations that were considered above, it is difficult to provide accurate temperature measurements in non-uniform profiles.
Ideally, one would need to formulate an analytical model, which would illustrate how temperature measurement inaccuracy depends on level of nonuniformity, and, of course, the specific methodology of measurement. However, it is unrealistically difficult to construct a specific analytical model of error propagation according to the mechanism of every method, something that has been illustrated very clearly in [12], where the analysis of nonuniformity effect on the simplest two-color method was determined by the complex coupling between temperature and concentration.
Rather than formulating (perhaps impossible) analytical models, we propose to use machine learning models, in order to use artificial intelligence to explore natural complexity. The logic behind is that, although different physics-based temperature measurement methods can operate on several physical principles, their data utilization fashion can be simply divided into two types, i.e., using few lines (regional features), such as two colors; or using multiple lines/bands (global features), such as profile fitting. Therefore, by training machine learning models with different data-utilization algorithms, we can summarize various physical models into several machine-learning-based surrogate models, and approach the optimal performance of each data-utilization algorithm. Through analyzing the performance of several such machine learning models, the nonuniformity effect can be quantified. Moreover, although machine learning models are in principle physics-agnostic, it is also true that they can be linked to different types of physical methods [17]. By analyzing the performance of various machine learning models in the particular physical problem, we could introduce the machine learning algorithms to the physical models that they represent in order to acquire physical insight about the temperature nonuniformity.
In this study, first, machine-learning-based models are trained to measure temperature from uniform-profile spectra. Then, we show that the application of the best three of the sixteen tested models, namely, Gaussian Process Regression (GPR) [18], VGG-13 [19] and Boosted Random Forest (BRF) [20], to nonuniform-profile spectra introduces significant measure error and poor capability of spectral twin distinguishment. In order to address this issue, these three models are retrained on spectra that were acquired from non-uniform profiles and applied to spectra of different nonuniformities. It is shown that nonuniformities can be captured with acceptable accuracy by models which utilize global features. In order to understand the precise nature of the issues that non-uniformity causes, we used T-distributed Stochastic Neighbor Embedding (t-SNE) [21] and Linear Discriminant Analysis (LDA) [22], which allow representation of the data in spaces of low dimensionality.

Spectral data generation
CO2 absorption spectra were synthesized using the HITRAN Application Programming Interface (HAPI) [23] and HITEMP 2010 databases [24], where, the substance, light path length, temperature, and mole fraction (concentration) were used as inputs in order to generate a spectrum.
Irrespectively of whether spectra are generated for the uniform or nonuniform profiles, the substance type, total light path length, temperature range and concentration range were controlled to be identical so that only the effect of nonuniformity determined the form of the generated spectra. The light-path length was set to 10 cm; the temperature was assigned from a typical combustion temperature range of 600-2000 K; and the wavelength band chosen is 2375-2395 cm −1 , because the line intensity in this range is sensitive to temperature changes, as shown in Fig 2. As for the mole fraction range, we chose it to be in the range of 0.05-0.07, which was deliberately selected in order to generate effective spectral twins.
In order to generate nonuniform profiles, we divided the light path into five sections, and randomly set the temperature and mole fraction in each section from the ranges aforementioned. In order to generate effective spectral twins, we had to avoid generating two types of trivially identical spectra samples, i.e., samples from different profiles with the same average temperature, concentration, and spectrum. One such case is two profiles that have the same temperature and mole fraction distributions, but just the ordering of the path sections differs when the light path is uniformly divided. This can be seen from the Beer-Lambert law (Eq 1): Where, α, T, S, ϕ, P, x, and ΔL are absorptivity, temperature, line intensity, line shape function, pressure, mole fraction, and section length, respectively. The subscripts v and i indicate frequency and section index respectively. Since the summation operation does not convey sequential spatial information, changing the order of the sections will not be reflected in the spectra. An example is shown in Fig 3(a), the profile of sample 2 is the profile of sample 1 with the order of first two sections changed. The spectra corresponding to these different profiles are still exactly same. In order to avoid such intrinsic identical samples, we divide the light path into a geometric sequence with the ratio of 1.4. Thus, the relative length of the sections is [1, 1.4, 1.96, 2.74, 5.38]. With such nonuniform light path division, the section exchange cannot lead to same spectra, as expressed by Eq 1. However, this configuration is not sufficient to suppress a second type of trivially identical spectra, which have equivalent aggregated sections. In order to see how this can happen, consider Fig 3(b). With the exception of first two sections, the temperature and concentration profiles of the two samples are the same. Since the first two sections have equal temperature, and the average mole fraction in the aggregated section comprising the first two sections is equal to 0.05, the spectra generated will be identical. The trick to tackle this condition is to constrain the maximum ratio of the mole fraction between sections to be smaller than 1.4, the ratio of geometric sequence of section length. This is why the mole fraction in each section was selected between 0.05-0.07 (not including the value of 0.07).
After these precautions and in order not to generate trivially identical spectra, 10,000 samples of spectra were generated for uniform profiles and non-uniform profiles that comprised five sections of step-wise changing values of both mole fraction and temperature.

Spectral twin selection
In order to assess the inaccuracy caused by spectral twins, we first selected spectral twins from the samples we generated. An algorithm for such purpose was developed and its principle is as follows. For a given spectrum, its twin counterpart should satisfy two requirements: (1) be similar to the given spectra, which is judged by Mean Square Error (MSE) between spectra lines smaller than a similarity index criterion ε and (2) that among all candidates satisfy the first requirement, it has the largest temperature difference from the given spectrum.
Here, we selected three levels of similarity index criterion, i.e., � = 10 −3 , 10 −4 and 10 −5 , in order to gradually diminish the MSE between twin spectra. Correspondingly, three datasets of spectral twins, were generated. Table 1 summarizes the statistics of spectrum and spectral twins for these three datasets. When looking at the sample level, we observe that the average temperatures do not reach the configured boundaries, i.e., 600 and 2000 K. This is because in order to reach the boundary values of the temperature range, all five sections must have the boundary value, however, since the temperature was set independently in each section, reaching five boundary temperatures is a small probability event. Because temperature and concentration are set randomly, the average temperature of the nonuniform-profile data follows the Irwin-Hall distribution [25] for the distribution of the sum of n = 5 variables from multiple independent uniform distributions. Moreover, the stricter the criterion, the narrower the average temperature range that we can estimate. This is because when the similarity index criterion, i.e., MSE requirement, is stricter, it is harder for samples to find corresponding counterparts that will make up spectral twins. This is reflected in the decreasing percentage of twins detected. The spectra losing their twins mainly concentrate in both tails of the distribution. This is because they have fewer candidates that can be spectral twins, thus causing the shrinkage of temperature range of datasets of spectral twins. When examining the twin levels from Table 1, we see that the temperature difference between the reference spectra and their twin counterparts is decreasing while their correlations are increasing as the criteria are tightened, in terms of the changes in Absolute Error (AE), Root Mean Square Error (RMSE) and Pearson correlation coefficient (R). It is notable, however that, even at a similarity level of 10 −5 when the highest correlation between spectral twins is achieved, the maximum temperature difference between twins can still reach up to 216.9 K, with a median RMSE and AE of about 70 K, which reflects the fact that significant temperature difference still exists between the twins under these conditions. This points to the importance of correct detection of the twins in the training algorithms that will follow.

Machine learning model training
Machine-learning models were trained as surrogate models of conventional physical methods to predict the average temperature from uniform-profile spectra. Specifically, 10,000 uniformprofile samples generated as described above were divided into training, validation, and test sets, in the numbers of 7000, 1500, 1500 samples, respectively. In total, we trained 16 machine learning models to predict temperatures from spectra of uniform profiles, as shown in Table 2.
The models can be categorized in five classes according to the learning process, i.e., Support Vector Regression (SVR) [26], Gaussian Process Regression (GPR) [18], Ridge regression [27], Boosted Random Forest (BRF) [20] in XGBoost [28], and Neural Network (NN). In SVR, GPR, and Ridge regression, a number of kernels were evaluated, e.g., radial basis function (RBF), polynomial (Poly), linear and constant, which are available in Scikit Learn [29]. In the case of NN, in addition to the conventional Multi-Layer Perceptron (MLP) [30], a number of Convolutional Neural Network (CNN) architectures [31] were considered, i.e., VGG [19], ResNet [32], Squeeze Net [33] and Xceptron [34]. These CNN architectures were originally devised for image classification tasks, but we modified them to be suitable for 1D spectrum data and regression tasks, by changing to 1D convolutional layers and removing the operations of batch normalization and dropout. More details about the machine learning algorithms can be found in the references cited.
The specific machine learning methods were chosen based on a consideration of their fundamental way of utilizing data. For instance, SVR uses support vectors, i.e., a tiny subset of the training dataset, in order to construct a hyperplane; while GPR is a non-parametric algorithm, which uses all features of all the samples of the training set to construct a correlation matrix as the reference for the test samples. Ridge regression also uses the information of all samples, since it is in fact a type of regularized least squares regression. BRF is an ensemble learning method, which is constructed by multiple tree models [35], but each one only uses partial features of subsets of training samples. As for NN, except the MLP which uses the raw features of the samples, all CNN architectures learn new features from the raw ones using the features of the entire training set. In summary, SVR and BRF utilize regional information, but the former utilizes partial samples, while the latter utilizes partial features, similar to a two-color measurement. The remaining three algorithms, GPR, Ridge and NN utilize global information, all features of all samples, which carries some similarity to the line-reversal method of classical spectroscopy.
A variety of metrics were used in order to assess model performance, including RMSE, R, and the maximum, mean and median of the AE. By examining the performance of the models in Table 2, we observe that using different kernels or variants of a machine learning algorithm does not have a substantial impact on performance. Moreover, we can see that the top performing method in terms of median absolute error is GPR, followed by NN, then BRF, Ridge regression, and finally SVR. In what follows, we choose the top three models, i.e., GPR with constant kernel (called as GPR in what follows), BRF and VGG13 (called as VGG in what follows), as the representatives of these surrogate models of physical methods. At the same time, these selections also represent three different ways of utilizing data as discussed above. From the demonstrations of Fig 4, it is observed that all these three models provide accurate predictions of temperature from uniform-profile spectra without noticeable differences.

Generalization performance of models on nonuniform-profile data
We then applied GPR, VGG and BRF which were shown to be the top-performing algorithms of estimation of uniformly distributed temperature to spectra of nonuniform profiles comprising of five sections of randomly chosen temperature and mole fraction as explained above. The performance of these models is summarized in Table 3. By comparing Tables 2 to 3, it is evident that the performance of these models has dramatically degraded in the case of spectra of nonuniform profiles. It is worth noting that the maximum AE reaches up to 942 K, with its lowest value being at 615 K. The total temperature range is 2000-600 = 1400 K, which means that the maximum AE is at least 44% of the entire temperature range. When examining the  RMSE, mean AE and Median AE metrics, they are at least one order of magnitude higher than those of the spectra of uniform profiles, which is always above 10% of the temperature range of the available data. These quantitative results indicate that directly using these models for uniform profiles will cause significantly inaccurate measurements of temperature based on the spectra of nonuniform profiles. Although the models do not estimate the temperature accurately on nonuniform-profile data, they do capture trends in temperature correctly, as shown in Fig 5. This observation suggests that the nonuniform-profile data have some common properties with uniform-profile data. However, these two types of data also have substantial differences and as a result, the models performing well in predicting temperatures uniformly distributed in space cannot perform well in temperature estimations from non-uniform data.
Meanwhile, a very interesting observation is the performance rank of these three models on nonuniform-profile data, which is in fact in the inverse order of that of uniform-profile data, i.e., BRF > VGG > GPR. This may be due to two reasons. First, one model may overfit the uniform-profile data and underfit the nonuniform-profile data. Second, the machine learning method that uses partial features in an ensemble configuration is likely to be more robust than VGG which extracts new features from raw features. Clearly, GPR uses all the original features in the samples and is thus more sensitive to changes in the data, and thus demonstrates the least robust performance.

Generalized performance of models on spectral twins
Because of the importance of the correct detection of spectral twins that was pointed out in section 2.2., this was also examined for the case of non-uniform profiles. The metrics of Relative Error (RE), R, Percentage of consistency (P CST ), and Percentage of Closeness (P CLS ) were used to assess the performance of models on spectral twins. The definition of RE is as follows.
where T and b T are the ground truth and the prediction of average temperature, respectively; the subscripts k and j represent the reference spectrum and its counterpart for making up spectral twins, respectively.
In the best case, models should sense all individuals possessing the higher average temperature in the twin pairs and thus provide a corresponding higher temperature estimation. We defined P CST as the proportion in all estimations of spectral twins that satisfy this condition: Where s is the total number of spectral twins, and m denotes the spectral twin index, which ranges from 1 to s. c 1 computes the total number of spectral-twin estimations which satisfy Additionally, the estimation of an individual of the spectral twin should be closer to the actual temperature of this individual rather than that of its counterpart in the twins, which can be expressed as that j c T k;m À T k;m j j c T k;m À T j;m j < 1. Accordingly, we defined P CLS as shown in Eqs (5) and (6), which calculate the proportion of such cases among all pairs of spectral twins. Under the ideal condition, P CLS should also have a value of 1.
The performance of the models is summarized in Table 4. The distribution of RE is highly right-skewed, as indicated by its large positive skewness; therefore, the median and 3 rd quartile of RE are more statistically meaningful than the mean. In evaluating the results of Table 4, it is probably best to start with the "median" row, which provides a measure of the per-cent difference in average temperature that the several algorithms can "sense" as twins for a given similarity criterion. In this context, the larger this number, the better the performance of the algorithm in terms of twin detention, because then even twins with sizeable temperature differences can be detected. It is observed that the best result is realized by GPR on the dataset at ε = 10 −3 , however even there, half of the estimations of these models are only able to capture twins with temperature differences 32% or smaller. The stricter similarity criterion causes even weaker ability of the models to distinguish twins with sizeable temperature differences, for instance, at the level of = 10 −5 , the medians of RE dive to 0.14, 0.10 and 0.08 for GPR, VGG and BRF, respectively. Similar degrading performance trends are also observed in the metrics of R, and P CST , except in the case of P CST for GPR, which is increasing with a stricter criterion. P CLS is just above 0.5 on all conditions, so whether the estimation is closer or not to the reference spectrum is fairly random. Considering their previous poor accuracy in predicting average temperature from nonuniform-profile spectra, it is unsurprising that these models have poor performance on distinguishing spectral twins for spectra from non-uniform temperature distributions. In fact, as we showed in section 2.2., the sensitivity of distinguishing spectral twins and accuracy of temperature measurement are tightly correlated.
In addition, we can see that different algorithms have their strength on different metrics. GPR reaches largest values of Median and 3 rd quartile of RE, and smallest values of R, which demonstrates that it has the best performance in providing estimations that can distinguish between spectral twins, even when these have sizeable temperature differences whereas, BRF demonstrates a diminished capability in this regard. On the other hand, in terms of P CST , the ranking is reversed. This is because GPR is a very sensitive algorithm, which can detect slight differences between spectra. However, whether the estimations are right or not is a more complex matter. BRF, on the other hand, being an ensemble method, is more robust in detecting the temperature magnitude relationship between twins.
In Fig 6, we plot the performance of GPR, VGG, and BRF on the first 40 twins with the smallest and largest temperature difference at the similarity levels of 10 −3 and 10 −5 . The twins with the largest temperature difference often stay around the middle of the temperature range, i.e., 1300 K. The twins with the smallest temperature difference mainly lie near the boundaries of the temperature range. The reason behind this behavior is the same as the one responsible for the Irwin-Hall distribution of the average temperature that was discussed in Section 2.2. Specifically, a spectrum implying a moderate average temperature can have a temperature profile derived from either a section combination of consistently moderate temperatures or a combination of high and low temperature sections complementing each other. However, a spectrum implying an extremely low or high temperature can only have a combination of sections of consistently low or high temperature profile. Clearly, spectra implying moderate temperatures have a higher degree of freedom in profile composition and consequently are prone to be composed of spectral twins with large temperature differences. Therefore, the spectral twins with the smallest temperature difference are not as representative and meaningful as the ones with the largest temperature difference, since they are merely limited by the data generation method and thus cannot assess the performance of models.
What also stands out in Fig 6 is that the models often underestimate the average temperatures for both spectral twins with largest and smallest temperature differences. This trend is particularly clear, when the similarity level is constrained to the 10 −5 level. In addition, although models are capable to sometimes capture (to a limited degree) the contrast between spectral twins at the similarity level of 10 −3 , this frail sensitivity is eliminated by a stricter similarity criterion. such demonstrations match the observations from Table 4. Moreover, we can see that GPR is more sensitive to spectral feature changes than the other two machine learning methods, since which is prone to provide more contrastive temperature estimations of spectral twins, while BRF is the least sensitive one.

Dataset splitting for retraining models
The substantial negative impact of nonuniformity on the temperature measurement has been identified and quantified from the analysis of the previous section. The question arises as to whether this negative impact can be overcome. In order to address this issue, we retrained the models using nonuniform-profile spectra, with the expectation that the models can learn the spectrum-temperature mapping relationship of nonuniform-profile spectra and distinguish the differences between spectral twins.
The machine learning algorithms used were in the category of supervised learning, so that the labelling of spectra, i.e., average temperature, would navigate the learning direction of machine-learning models. As a result, the presence of spectral twins could lead to information leakage, in which the knowledge learnt from the samples in the training set could lead to incorrect predictions on the twins of these samples in the test set. To avoid this condition, we have to isolate the training and test sets.
For this reason, we applied a novel dataset splitting method, namely bidirectional isolation splitting, the principle of which is shown schematically in Fig 7. First, we duplicate the dataset of nonuniform-profile samples. The copies of the original spectra are represented as orange color boxes. Then, we split the samples into training set and test set with the partition of 70% and 30%, respectively. The dataset splitting boundary is represented by a black dash line in Fig  7. The links between spectral twins are represented by lines with arrows. To avoid information leakage, as long as a pair of spectral twins is composed of samples from both the training and the test sets (shown as red lines-irrespectively of whether a reference spectrum in training set has its counterpart in test set or vice versa), these samples will be deleted from both training and test sets, i.e., bidirectional deleting will take place. Eventually, both training and test sets are self-contained and mutually exclusive, i.e., all samples in the training set only have their counterparts in the training set and so do the samples in test set. Of course, this kind of isolation is not thorough since one sample may have several similar counterparts, but bidirectional isolation can forbid the strongest effect from the most similar counterpart, which is termed twin. After this initial dataset splitting, the second bidirectional isolated splitting was only applied to the test set in order to divide it into two subsets with almost same number of samples, which were then used as the final test and validation sets.
The statistics of test sets from different similarity levels are shown in Table 5. Compared to Table 1, statistics in Table 5 displays diminished temperature ranges, which is reasonable since we have deleted some samples during dataset splitting. Other metrics such as mean and median, basically similar to the values in Table 1.

Retraining and testing models on five-section-profile spectra
We retrained GPR, VGG and BRF, and summarized their performance in Table 6. In terms of sample level, all the models have decent performance on test sets, although it is not as good performance as on uniform-profile spectra shown in Table 2. However, the performance is substantially improved in terms of accuracy compared to the generalization performance of models trained on uniform-profile spectra, which was shown in Table 3. This implies that estimating accurate average temperatures from nonuniform-profile spectra is more difficult than estimating those from uniform-profile spectra, which is not surprising, but it also confirms that estimating average temperature from nonuniform-profile spectra is feasible, and the performance is acceptable.
As for the performance in terms of distinguishing between twins, which is also presented in Table 6, there is also substantial improvement in comparison to Table 4. This is again expected because of the close relation of the accuracy of the algorithms under consideration to the capability of the algorithms to distinguish between spectral twins. If we examine the effect of similarity levels, in addition to R and RMSE values, which are affected by the test set's similarity level, almost all other metrics in the twin level are also weakened, demonstrating that a stricter similarity criterion makes it more difficult to distinguish spectral twins. However, a higher similarity requirement reduces the estimation AE at the sample level, which is due to a stricter similarity level of samples forces models to figure out the details of spectral samples, as in adversarial learning [36].
In terms of median absolute error, the algorithms rank as GPR< VGG< BRF, which is the same ranking as the one observed in Table 2. This proves again that GPR, the algorithm that uses all features of the entire dataset, is most sensitive to slight differences between samples. On the contrary, BRF is the most insensitive one, since it cannot detect the subtle differences between spectral twins. As a result, the absolute error of BRF increases significantly when the similarity level changed to 10 −5 from 10 −4 . This is also supported by the results of Fig 8, which show that the predictions of BRF do not always coincide with the ground truth of spectral twins.

Increase nonuniformity
So far, we have examined the capabilities of the various algorithms with non-uniform profiles of five step-wise levels. The ability of the above models to achieve good generalization performance further than this artificial level of non-uniformity will determine whether the effect of nonuniformity can be truly overcome and the models trained can be regarded as potentially universal tools.
As a more complex profile compared to five-section profile, a ten-section profile was employed, the configurations of temperature, concentration, etc. were still set as before, but the light path is discretized uniformly rather than geometrically. 500 samples of this kind of profile were generated in order to test the already trained models. Samples from both five-section and ten-section profiles which have same average temperatures but different average concentrations are plotted in Fig 9. The spectra of these two types of profiles are apparently different, which is the combined result of different temperature nonuniformity and the concentration distribution.
We applied the models trained on five-section-profile data to ten-section profile spectra, and the performances of the models are summarized in Table 7. Comparing with the data of Table 6, the very encouraging (to an extent unanticipated) result is that the performance of both GPR and VGG is identical and not affected by the nonuniformity change, as also shown in Fig 10. On the other hand, the performance of BRF has significantly degraded. Also, consistently with its generalization performance on five-section-profile spectra (Fig 5), BRF underestimates the average temperatures of the spectra. The differences between model performance reveal that different nonuniformity levels cause regional differences in spectral appearance, but these regional differences are not strong enough to confuse the estimations of VGG and GPR. This is because GPR and VGG utilize all features of the distribution under consideration, while BRF utilizes multiple partial features. We could then hypothesize that the conventional physical methods which also utilize regional information, such as the two-color method, also suffer performance degradation when used without substantial consideration of the impact of nonuniformity. Table 8 summarizes the generalization performance of the models on uniform-profile spectra, which are considered as a case of extremely low non-uniformity. The AE medians are relatively low except ones of BRF, but notably they are higher than their counterparts in Table 7. Because the median is not affected by outliers, i.e., extreme high estimation errors, therefore, the decent median proves that models of VGG and GPR still maintain excellent performance on uniform-profile spectra, although worse than their performance on ten-section-profile spectra. However, GPR and VGG acquire much higher AE maxima than those in Table 7, which can be higher than 1000 K and cause significant increase in mean AE, and RMSEs. The reason for this can be seen using Fig 11, which shows that VGG and GPR work pretty well on most samples, except for the cases of which the average temperatures are close to the boundary of the examined temperature range, e.g., 2000 K. The nonuniform-profile training set does not contain samples with such high-temperatures, as shown in Table 5. Therefore, the increased AE maxima, AE means, and RMSEs only prove that VGG and GPR are not good at extrapolation, although they demonstrate excellent interpolation performance. Fig 11 also shows that BRF Table 7. Generalization performance of the retrained models on ten-section-profile spectra.

Model
Twin  has relatively good performance in extrapolation but its prediction accuracy on most samples is unsatisfactory.
In closure, the generalization performance of GPR and VGG on both uniform or more nonuniform-profile samples proves that the negative effect of nonuniformity can be overcome regardless of the nonuniformity level of spectra. GPR and VGG, which use global features, are promising tools in order to perform spectral measurements of temperature from spatially nonuniform temperature distributions, with the condition that the prediction should be interpolation rather than extrapolation.

Non-uniformity effect visualization
Although the examination of the performance of several algorithms have demonstrated that VGG and GPR can maintain decent performance with different degrees of nonuniformity, the reason behind this finding has not been shown in the previous analysis. In order to explore  such reasons, we used dimensionality reduction technologies to map the high dimensional spectra into two Dimensional (2D) features and then visualized the data in a two-dimensional space of features in order to obtain intuitive insights into the phenomenon. Two dimensionality reduction technologies were attempted, they are t-distributed Stochastic Neighbor Embedding (t-SNE) [21] and Linear Discriminant Analysis (LDA) [22]. T-SNE is an unsupervised algorithm, which means no label information is provided during mapping high dimensional spectra to 2D features. T-SNE has the advantage of being able to maintain the relative distance relationship between samples after dimensionality reduction, thus allowing for the observation of structure of dataset distributions. LDA is a supervised algorithm. By utilizing the information of dataset belongings of spectral samples, it offers the possibility to amplify the differences between datasets of varying nonuniformity. More details of the algorithms can be found in the references cited therein.
T-SNE was first performed on randomly picked 500 samples from three datasets of different nonuniformities (i.e., uniform profiles, as well as profiles with five and ten different temperature levels). The high dimensional space that spectra sitting in was transformed to a 2D space, and the spectra were mapped to vectors comprised by two values. These 2D vectors (called as features in what follows) are then further plotted into Fig 12(a). Notably, the samples of fivesection profiles and ten-section profiles are distributed in the same area in the 2D space of features, even though they have distinctive nonuniformity levels and light path divisions. The distribution of uniform-profile samples only has partial overlap with the distributions of nonuniform-profile samples, especially samples from five-section profiles.
This overlap between uniform-and nonuniform-profile datasets is almost eliminated in the results of the supervised LDA algorithm, as shown in Fig 12(b). The remained overlapping points are far from the cluster of nonuniform profile dataset but inside the cluster of uniform profile dataset. This phenomenon in fact implies a special case of nonuniform profile that uniform profile can be generated from individually random setting of discretized sections, despite it is a small probability event. Meanwhile, it is noted that, even when processing with the strongly label-led LDA, five-section and ten-section-profile spectra datasets are still tightly matched. This phenomenon tells that these two nonuniform datasets share closely similar properties of in terms of spectral forms.
Apart from the correlation between datasets, it is more crucial to understand the correlation of spectra-temperature mapping relationship between datasets. Inspired by the idea of locating patch in Visual Transformer (ViT) [37], we injected temperature information into the spectral information by adding normalized temperature into every normalized line intensity of the spectra; therefore, the aggregated "spectra" contain the information of both raw spectra and their corresponding temperature information.
The updated feature maps of t-SNE and LDA are respectively corresponding to Fig 12(c) and 12(d). Comparing two feature maps generated by t-SNE, i.e., Fig 12(a) and 12(c), we see that the overlapping of uniform and nonuniform samples is reduced but still exists. The vanishing overlaps in fact correspond to the concept of spectral twins. However, the feature maps of the LDA , Fig 12(b) and 12(d), do not differ substantially, which is due to dataset belongings of samples have already been utilized in supervised LDA operation, the further detailed temperature information is somehow redundant. Moreover, it is noted that the datasets of fivesection profile and ten-section profile remain in close vicinity in the reduced dimensionality space of either t-SNE or LDA. These observations indicate that two strongly nonuniform-profile datasets do not only have very similar spectra appearances but also have very similar spectra-temperature mapping relationship. This is the reason why VGG and GPR demonstrate good generalization performance on data sets of varying nonuniformity. In the meanwhile, the evident distinction between uniform-and non-uniform-profile datasets explains why models trained on uniform-profile datasets do not generalize well to non-uniform-profile datasets. Conversely, models trained on nonuniform profile datasets, on the other hand, can generalize decently to uniform profile datasets. This is because uniform profile can be generated with a small probability in a random setting of five sections, as discussed above, which is why a few points of nonuniform-profile dataset leave their own dataset center but overlap the points in the cluster of uniform-profile datasets.

Discussion
In this work, we provided quantitative results and associated discussions in terms of the effect of spatial non-uniformity with respect to the parameters of temperature and concentration distribution on the accuracy of the average temperature measurement estimate using laser absorption spectroscopy. The focus of the research was to explore the feasibility whether machine learning methods could provide reliable parameter estimates by overcoming this kind of effect.
Indeed, some earlier studies [11,13,15] provided the means to measure average temperature from absorption spectra subject to the presence of non-uniformity. However, they neither provided a systematic study of the effects of the non-uniformity, nor attempted to define its impact on the accuracy of the mean temperature measurements. For instance, in [12], there was an attempt to formally analyze and explain the mechanism of the effect of non-uniformities in temperature and distribution concentration. The outcomes of this work have merit, however, due to the complexities of the associated theoretical analysis in the simultaneous presence of non-uniformities in both temperature and concentration, the associated conclusions were not clear. In this research, we pursue a fundamentally different approach. The essence of the underlying concept is to use multiple machine-learning-based surrogate models, where each model was trained on uniform-profile data, to estimate the average temperature from nonuniform-profile data, and next, apply data analysis tools to quantify the effect of nonuniformity. Based on quantitative statistics, we confirmed that using temperature measurement tools for uniform profiles directly on spectra generated from non-uniform profiles introduces significant errors.
In addition, to the best of our knowledge, this is the first work, which attempts to overcome non-uniformity effects in laser absorption spectra by machine-learning-based solutions. The results demonstrate that some machine-learning-based solutions, based on global information, demonstrate robustness to changes in the magnitude and style of non-uniformity in temperature and concentration distributions, which could potentially solve the problem of measuring average temperature from laser absorption spectra, in the case of nonuniform profiles.

Conclusion
In this study, we utilized machine learning to quantify and understand the negative effect of nonuniformity on LAS temperature measurement. The conclusions can be drawn as follows: Although various machine-learning surrogate models of physical methods perform very satisfactory temperature prediction on spectra from areas of uniform temperature, they deteriorate significantly on nonuniform-profile spectra with huge measurement errors and negligible capability of distinguishing spectral twins. These results demonstrated that directly using uniform-profile-targeted methods to nonuniform-profile spectra is improper.
GPR and VGG retrained on five-section-profile spectra generalize well on spectra from different nonuniformities, whereas BRF cannot. This demonstrates the nonuniformity effect can be overcome by machine learning algorithms which use global features of the acquired spectra, something that is true for GPR and VGG, but not for BRF.
The good generalization performance of GPR and VGG was shown not to be affected by the degree of non-uniformity in the modelled spectra. This was rationalized by mapping the data in spaces of reduced dimensionality through t-SNE and LDA, which indicated that strongly nonuniform-profile spectra share the very similar properties in terms of both spectral appearance and spectrum-temperature mapping in the context of the two features that t-SNA and LDA generated. In the low-dimensionality spaces and with the exception of very few overlaps, non-uniform-profile datasets were shown to possess distinct values of features from uniform-profile datasets, which is the reason why attempting generalization on spectra from nonuniform temperature distributions with algorithms that were trained on spectra from uniform profiles can lead to substantial inaccuracies.